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Abstract 

The number of co-infections of a pathogen (multiplicity of infection or MOI) is a relevant parameter in epidemiology as it 
relates to transmission intensity. Notably, such quantities can be built into a metric in the context of disease control and 
prevention. Having applications to malaria in mind, we develop here a maximum-likelihood (ML) framework to estimate the 
quantities of interest at low computational and no additional costs to study designs or data collection. We show how the 
ML estimate for the quantities of interest and corresponding confidence-regions are obtained from multiple genetic loci. 
Assuming specifically that infections are rare and independent events, the number of infections per host follows a 
conditional Poisson distribution. Under this assumption, we show that a unique ML estimate for the parameter ().) 
describing MOI exists which is found by a simple recursion. Moreover, we provide explicit formulas for asymptotic 
confidence intervals, and show that profile-likelihood-based confidence intervals exist, which are found by a simple two- 
dimensional recursion. Based on the confidence intervals we provide alternative statistical tests for the MOI parameter. 
Finally, we illustrate the methods on three malaria data sets. The statistical framework however is not limited to malaria. 
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Introduction 

Infections are ubiquitous and ecologically complex processes. 
Indeed the chain of events conducing to the colonization and 
replication of parasites within a host involves many environmental, 
physiological, and genetic factors both in the host and the 
infectious agent. A common observation in many host-parasite 
interactions is that there are multiple genetically distinct lineages of 
the pathogen infecting the same individual host [1-3]. Whereas in 
some diseases such as malaria, this is considered an important 
parameter, in others it is still somehow a neglected aspect that is 
just starting to be considered [2]. 

The observation of multiple genetic variants or multiplicity of 
infection (MOI) is indicative of the transmission dynamics since it 
allows for the co-transmission of different parasite variants or the 
overlap of several genetic variants due to multiple infectious 
contacts. Thus, the incidence of MOI or superparasitism per se is 
an important metric of exposure [2,4—7]. In addition to its 
epidemiological importance, as many other ecological processes 
involving genetically distinct individuals, MOI leads to several 
outcomes derived from the interactions among lineages. This 
process is usually referred to as the intra-host dynamics [3]. 

During the last two decades, the outcomes of intra-host 
dynamics have been the subject of several theoretical and 
experimental investigations exploring a broad spectrum of 
scenarios. Usually, such studies focus on major effects that 
different interconnected factors have in terms of parasite 
dispersion (parasite fitness) and/or the elicited manifestations of 



disease that may lead to an effect on the host's fitness [3,8-11]. 
Furthermore, intra-host dynamics also affect the spread of parasite 
lineages with adaptive mutations conferring resistance to antimi- 
crobial agents or that allow the evasion of immune and/or 
vaccine-mediated protection [12,13]. Under all these circumstanc- 
es, following or measuring MOI as a parameter is essential 
whenever epidemiological inferences or models involving intra- 
host dynamics are formulated. 

Although it is possible to control or measure the number of 
distinctive parasite lineages in models and experimental settings 
(e.g. [14]), a totally different scenario is the one faced by those 
studying naturally occurring infections in the context of ecological 
and epidemiological investigations [4—6,15,16]. Under such 
circumstances, MOI is usually measured by ad hoc metrics that 
rely on a set of genetic markers or the observed polymorphism in 
one or several genes [2] . The need for an experimental definition 
of MOI has generated approaches based on phylogenetic 
frameworks (e.g. many viruses) or some form of multi-locus 
genotyping [2,17]. Whereas such approximations have been 
useful, there is still need for a formal statistical framework that 
allows the estimation of the actual number of lineages and other 
approximations to MOI that facilitates and/or considers con- 
founding factors. 

Given the broad spectrum of genetic architectures observed in 
parasitic organisms, it is not possible to define a universal 
framework of MOI. E.g. HIV accumulates mutations at a rate 
that allows for the use of phylogenetic base methods [17]. On the 
other hand, eukaryotic parasites such as Plasmodium, Trypanosoma, 
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Toxoplasma, and Schistosoma [18,19] and bacteria such as Myco- 
bacterium [16] evolve at a rate at which it is possible to determine 
a stable number of genetically distinct lineages during the course of 
an infection given a set of genetic markers. In this investigation, we 
describe a formal statistical framework to estimate MOI that 
allows, among other aspects, building formal tests for comparing 
groups, e.g., before or after deploying an intervention such as a 
vaccine, complicated versus non-complicated cases, populations 
with different exposures, among other possibilities. 

More specifically, we further develop the maximum-likelihood 
framework introduced by [20], which allows to estimate MOI and 
prevalences of pathogen lineages from a single genetic marker, 
e.g., microsatellite loci. We establish how to compute ML 
estimates and confidence intervals (or regions) for all involved 
parameters. Based on these, we show how statistical tests can be 
constructed to test the parameters. Although, the framework is - in 
principle - not restricted to a particular disease or species, we 
applied it to malaria by comparing data sets from three endemic 
regions with different levels of endemicity. 

The philosophy behind the method section's structure is the 
following. We first establish the general methods and then refine 
them assuming that the number of co-infections follows a 
conditional Poisson distribution. This structure embraces a better 
understanding of how to derive particular results for alternative 
choices to the Poisson distribution. Moreover, rigorous mathe- 
matical proofs are shifted to the appendix. Readers less interested 
in these technical details should feel free to skip them. 

Methods 

We adapt the maximum-likelihood method of [20] to estimate 
the average MOI. This approach is fully compatible with the 
model of [12,21] which describes the hitchhiking effect associated 
with drug resistance in Malaria, for which MOI is a fundamental 
quantity. Being able to estimate MOI, the model can be 'reverse 
engineered' to reconstruct the evolutionary process underlying 
drug resistance. By doing so, a formal means is provided to identify 
those among the many compounding factors, which can be 
influenced to slow-down or prevent the spread of drug resistance 
in the course of public health initiatives. 



m = (m\, . . . ,m„), ( ) : = — is a multinomial coefficient, 

\m J m\ \ . . .m n \ 

and p m : = p™ 1 . . . p™" . Clearly, ill summarizes the pathogen config- 
uration infecting a host. 

In practice, m is unknown for a given host. It is possible to 
detect which alleles (or lineages) are present in a clinical sample, 
but it is difficult to reliably reconstruct m without using next 
generation sequencing, a technology that is not practical to use in 
many settings. For instance, if only a single allele, say A \ , is found 
in a clinical sample, the patient might have been infected by just 
one parasite lineages (m = 1), or co-infected by several lineages 
(m = 2,3, . . .), all of which carry allele A\. Hence, it is convenient 
to represent an infection (lineages detected in a patient) by a vector 
of zeros and ones of length n, referring to the detected alleles 
(lineages). Hence, a clinical sample is represented by a vector 
i = (h, . . . ,i'„)e{0,l}"\{0}, where ik = l if At is found in the 
infection, and otherwise it = 0. In mathematical terms 
/ = sign m = (sign mi , . . . ,sign m„). (Remember sign 0 = 0 and 
sign x= 1 for x>0). Note that the vector =(0, . . . ,0) is excluded, 
which corresponds to no infection. In the following, m will always 
denote a vector of nonnegative integers and i a vector of zeros and 
ones. 

Let m be the multiplicity of infection (MOI) with distribution 
K m . Because k,„ is unknown in practice, we aim to estimate it from 
clinical samples - or rather some summary statistics characterizing 

K m . 

Assume a total of N clinical samples, taken from different hosts 
roughly at the same time. We assume that the n lineages A\,. . .A„ 
detected in the samples are all lineages circulating in the 
population. (There is no knowledge of undetectable lineages.) 
Each clinical sample contains one or more of the n lineages 
(alleles). (We assume that lineages that infected the host have not 
vanished due to intra-host dynamics, e.g., drug treatments, and 
that new lineages have not emerged inside the host, e.g. by 
mutation, recombination etc.) A clinical specimen with allelic (or 
lineage) configuration i could descend from an infection with 
pathogen configuration m as long as sign m = i. Let Q, denote the 
expected frequency of clinical specimen with allelic configuration 
/. Then, 



1 Model background 

Assume n different 'lineages' A\, . . . ,A n of a pathogen, e.g., n 
alleles at a marker locus (or haplotypes in a non-recombining 
region), circulate in a given population. Particularly, we have 
neutral markers in mind characterizing linages, so that their 
frequencies do not change too rapidly, e.g., due to selection. The n 
lineages considered are those that contribute to infection, not new 
variants that are generated by mutation inside hosts, but 'fail' to 
participate in transmission. 

Because we identify a pathogen with the allele at the considered 
locus, we will use the terms 'lineage' and 'allele' synonymously. 
(We refrain from using the term strain, as we refer here to a 
genotypic characterization and the term strain may have different 
meanings across pathogens.) 

In vector notation, the lineages' relative frequencies are 
p = (pi , . . . ,/>„). An individual (host) Ls infected by m (not necessarily 
different) lineages of the pathogen with probability K m . The m lineages 
are sampled randomly from the pathogen population. Hence, within 
an infection, the combination of pathogen linages follows a 
multinomial distribution with parameters m and p\, . . . ,p n . Conse- 
quently, the probability that nik of the infecting linages carry allele Ak 

(171 \ 
\p m , where 



Qi= J2 Km J2 

m=\i\ 



(1) 



sign m = 
\m\=n 



where the first sum runs over all integers larger than or equal to 
|l| : = i\ + . . . + i„, as this obviously is the minimum number of 
parasite lineages that could have caused the infection. The second 
sum runs over all possible configurations in of exactly m parasites 
that lead to the allelic configuration i (i.e. sign m = i), and hence 
could have potentially infected the host. 

It follows, that for a given allele-frequency distribution p, Qi is 
determined by the distribution K m . If infections with the pathogen 
are rare, a natural assumption is that the number of pathogens 
infecting a host is Poisson distributed, or more precisely follows a 
conditional Poisson distribution (CPD), i.e., 



i r 



for m > 1 . 



(2) 



Of note, this conditions on the fact that each host is infected by 
at least one pathogen. The mean value of this distribution is 
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e l -\ 



Assuming the CPD (2), Qi can explicitly be derived. In Analysis 
(subsection 4.1) it is shown that 



e'- — 1 k = i 



infections only. We summarize this in the following remark which 
is proved in Analysis (subsection 4.3). 

Remark 1 If at least one sample contains more than one allele, i.e., 

n 

^N k >N, X = 0 is not the maximum likelihood estimate. 
fc=l 

To obtain the ML estimate for 0 = (X,p\ , . . . ,p n ), (4) needs to be 
maximized on the simplex, either using the method of Lagrange 
multiplies or by eliminating one of the redundant variables, i.e., by 

K-l 

setting e.g., p n = 1 — ^^/>i- When using Lagrange multipliers we 
i— l 

need to find the zeros of the derivatives of 



2 Maximum likelihood 

Consider a total of A samples or clinical specimen, n t of which 

have allelic configuration /. Hence, N= ^~]nj, where the sum 

i 

runs over all zero-one vectors of length n, i.e,. /e{0,l}"\{0} (the 
case of no infection i.e., i = 0 = (0, . . . ,0) is excluded). 

Since the (natural) likelihood for observing these samples is 

n 0' , the log-likelihood is given by 



A(0,/O = L(0)-j3(l-5>), 



(5) 



„ A 8A 8A dA M „ 
i.e., vA=(tt,t — , ■■-.-^ — )^77) = "- The derivatives based on 
81 dpi dp n dp 

the conditional Poisson distribution are derived in Analysis 
(subsection 4.4). The equations VA(0,/J) = O can be straightfor- 
wardly solved by a Newton method, i.e., by iterating 



L= log II log &. 



(3) 



(0, + i,/f, +1 ) = (0».A) + (A0,,Afc). 



(6a) 



Assuming the CPD for the number of lineages infecting a host, it 
is shown in Analysis (subsection 4.2) that the log-likelihood 
becomes 



where (A0 t , A/? t ) is the solution of the system of linear equations 



L = L(X,p) =-N\og(e x -\)+Y^N k log (e** - 1 ) , (4) 

fc=i 

where N k = = lfc/J/ is the number of samples that 

»:ifc = l te{0,l}" 

contain allele A k . The prevalence of allele k is then N k /N. 

n 

Notably, N k > N with equality if and only if exclusively single- 
fc=l 

lineage infections occur. This is one of two special cases that need 
to be treated separately. In the other special case all lineages are 
found in every infection. These cases are somewhat non-generic. 
We shall therefore formulate the following generic assumption. 

Assumption 1 Assume that the sum over the alleles' prevalences is larger 
than one, but not all alleles are 1 00% prevalent. In other words, more than one 

n 

lineage is found in at least one infection, i.e., S ' N k > N and not all lineages 

fc=l 

are found in every infection, i.e., Nt^N for at least one k. 
Results 

In the following X will refer to the parameter of the CPD, or in 
the general case, to the parameter (or parameter vector) 
summarizing the distribution K m . In the latter case A = 0 has to 
be interpreted as fcj = 1 . 

We shall start by deriving the maximum likelihood (ML) 
estimates for the parameters of interest. Before we do so, we shall 
start by a rather intuitive observation. 

Not surprisingly X = 0 can never be an ML estimate if multiple 
alleles are found in at least one sample, as X = 0 implies single 



-VA{0„p t ) = H(0 t ,p t y{A0„AfS,) 



(6b) 



and (0\ ,fi\) is any initial choice of 0 and fi. Here, H{0 t ,fi t ) denotes 
the (transposed) Hessian matrix evaluated at (0 t ,P t ), i.e., 



H(0,p) = 



( o^A 

3A 2 


a 2 A 

dp yd). 


d 2 A 
dpndX 


a 2 A \ 

a^ai 


a 2 A 

dXdpi 


e 2 A 

dp] 


d 2 A 
dpndpi 


a 2 a 

S/SSPI 


d 2 A 

dXdpn 


a 2 a 

dp i dp n 


a 2 a 
°% 


a 2 a 

dfidp„ 


8 2 A 


d 2 A 

d Pl BfS ■ 


a 2 a 

dp n dfl 


a 2 A 
dp ) 



(7) 



If, in the general case, X is a parameter vector, the derivatives 
above have to be interpreted accordingly. 

In the case of the conditional Poisson distribution (2) the entries 
of the Hessian matrix are derived in Analysis (subsection 4.4). 

Clearly, instead of (6) also (0< + i,j8 (+1 ) = (0 ( ,/? ( ) — 

(H(6t$ti)~ m VA(0t,Pt) can De iterated, which, however, is 
numerically less recommendable. Alternative approaches would 
be using an iterative least-square algorithm or the EM algorithm 
(cf. e.g. [22]). 

Of note, in general, an ML estimate does neither necessarily 
exist, nor is it unique, not to mention that closed formulas typically 
do not exist. Unfortunately, assuming the CPD (2), the ML 
estimate indeed cannot be calculated explicitly. However, the 
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estimate exists and is unique. Furthermore, although it can be 
straightforwardly derived by the above methods, the complexity of 
whole procedure can be greatly simplified. 

Result 1 Assume the conditional Poisson distribution (2) for K m . Under 
Assumption 1 there is a unique maximum likelihood estimate 
0 = {X,p\, . . . ,p n ). The first component X is the unique positive solution of 
the equation. 



/+^>g(l-^(l- e - A )) = 0 



N 



(8) 



It is found by iterating 



;,+ ^> g (i-^(i- e - ; -o) 



Xt+\ —kt- 



jt= i 



N 



(9) 



k=l 



Ne^-N k {e'^-\) 



which converges monotonically and at quadratic rate from any initial value 
ki>k. 

The maximum likelihood estimates of the allele frequencies are given by 



p k =-l\og(\-^(\-e- 1 ))- 



(10) 



The result is proven in Analysis (subsection 5.1). 

For the sake of completeness we shall also consider the instances 
in which Assumption 1 is violated. In the first situation, only one 
pathogen lineage is found in each infection, i.e., there is no 
indication whatsoever of co-infections. The results are summarized 
in the following remark which is proven in Analysis (subsection 
5-1). 

Remark 2 Assume that each sample contains only one allele, i.e., 
Nk = N. Then the ML estimates are X = 0 and p k = -rj- ■ 

k=l 

In the other non-generic case that all alleles are found in every 
sample an ML estimate does not exist, more precisely, it is go, 
implying that — with probability one — all alleles are in every 
sample independendy of the allele-frequency distribution. 

Remark 3 Assume N = Nk for all k. Then the ML estimate is 
"X = + oo "for every allelic distribution. 

A proof can be found in Analysis (subsection 5.1). 

Of note, the maximum likelihood has an intuitive interpreta- 
tion. We summarize this as the following result which is proven in 
Analysis (subsection 5.1). 

Remark 4 The maximum likelihood estimate 0 = (k,p\, . . . ,p n ) is the 
set of parameters for which the observed number of samples containing allele A k 
equals its expectations, i.e., 



N k = EN k =N 



1 — e - x P k 



1 Confidence intervals from the profile-likelihood 

Let 0 = (k,p\, . . . ,p„) denote the ML estimate. Confidence 
intervals can be derived from the profile-likelihood for each 
parameter. 

We are interested in finding a confidence interval (CI) for X. For 
a fixed value of X, the profile likelihood is defined as 

L (>) : = maxL(0)= maxL(X,p) 
p v 

i.e., as the maximum likelihood taken over the remaining 
parameters while keeping the parameter of interest fixed. 
Moreover, denote the maximum likelihood by L (clearly 
L = maxtf'). Suppose X$ is the true parameter and the 

corresponding profile likelihood. Then 



2(L-L y o)), 



Xv 



(11) 



i.e. twice the difference of the maximum likelihood minus the 
profile likelihood assuming the true parameter is % 2 distributed 
with one degree of freedom (cf. e.g. [23], chapter 4). This can be 
used to construct confidence intervals for the true parameter ko- 
To construct a CI at the (1 — a) level, we need to find all k 
satisfying 



2(L-L«)<c u _ a , 

i.e., we need to find X<X satisfying 2(L — L®) = 

— c i,i— a, where c njy denotes ot-quantile of the x 
distribution with n degrees of freedom. In other words, the 
equation iP^ — -L + Ci i i_ a /2 = 0 needs to be solved. By definition 
of L {> \ this means that L(X,p) — L + ci i_ a /2 = 0 needs to be 
solved with respect to (X,p), while simultaneously maximizing 
L(k,p) with respect to p. The latter is done using the method of 
Lagrange multipliers for fixed X, i.e., 



A x (p,p) = A(X,pS = L(X,p) -p(\-Y^pd, 



to 



the equations 



is maximized. 

va„ (/>./;) iVV... 

dpi dp n dp 
the bound of the confidence intervals are found by solving the 
following system of equations 



This leads 

-) = 0. Therefore, following [24] 



(L{k, P )-r\ 



f(k,P,P) 



ilpn 



= 0, 



(12) 



Hence, the maximum likelihood maximizes the exj 
likelihood. 



ion of the bg- 



where I* =L — c\ \_ a 

Clearly, f(X,p,[3) = 0 can be straightforwardly solved by a 
Newton method, i.e., by iterating 
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(A, + ij» (+ „A +1 ) = (A tlP( ,A) + (AA„Ap„Aft). (13a) , (^-lX(L-/*)+^ft) 



/ (^.-l)((L-/') + g^/?,) \ 

\ Vm-dc^-d+^J (16a) 

ea nations 8 -gfi i (^-/•X^-l)(^+g J> -l)-gA,(JVA,^-^ft(^-l)) \ flfiW 



where (AA ( , Ap ( , A/? ( ) is the solution of the system of linear 



-f(X„pJ t ) = Wf(l t ,p t ,p t )iAX,Ap t ,AP t ) (13b) where 

and (Xi,Pi,Pi) is any initial choice of 2, p and p. The derivative „ 

V/lX„,p n ,P n ) is identical to (7) except for the first line, which needs A = A(X t ,p t ) = , 

to be replaced by ^' k=i fjpr t ~ 1 



r dL 8L 
K dA ' dpi 



8L BL^ 

' 8p„ ' dp 



(14) 



The derivatives of L are given by (39). Hence, Vf(X,,p t ,p t ) is given 
by 



and 



N k X, 
fi, 



(16d) 



vAx t , Pl ,p t )- 



/ dA 

Jx 

8 2 A 


dA 
dpi 
d 2 A 


dA 

dp„ 

8 2 A 


0 

8 2 A 


dXdpi 


dp 2 ■ 


dp n dp\ 


dpdpi 


8 2 A 


d 2 A 


8 2 A 


8 2 A 


dldp n 
d 2 A 


dp\dp n 
d 2 A 


' Sp 2 n 

d 2 A 


dpdp n 
8 2 A 



\ dXdp dpic 



(15) 



dp n dp dp 2 ) 



where all derivatives are given by (39) and (40). 

Again, alternatively (X, + i,p t+l ,P t+l ) = (X,,p l ,p t )-{Vf 
(). t ,p t ,P,))~^ -f(X,,p t ,P t ) can be iterated, which however requires to 
invert the matrix Vf{X,,p t ,P t ) in every iteration step. The 
alternatives to the Newton method are again the EM algorithm 
or an iterated least-mean-square algorithm. 

To obtain the confidence bounds X and X it is necessary to 
iterate (13) from two different initial values. Of note, obtaining one 
bound for the confidence interval is numerically only as 
demanding as obtain the ML estimate. 

Confidence intervals for the allele frequencies pi are obtained 
similarly by iterating (13) with obvious changes. Namely, the first 

8A(X,p,P) 



component of the function / needs to be replaced by 



dx 



and the l)-th component by L{X,p) — T, i.e., / is the gradient 
of A with the derivative with respect to pi replaced by L(X,p) — l*. 
Consequendy V/ is identical to (7) with the (/+ l)-th component 
replaced by (14). 

Importandy, existence and uniqueness of the confidence bounds 
X and X can be proved under the assumption of the CPD (2). 
Moreover, it is possible to significandy reduce the complexity of 
the Newton method (13) to find the CI's bounds. We obtain the 
following result, which is proven in Analysis (subsection 5.2). 

Result 2 Suppose Assumption 1 holds. If K m is given by the conditional 
Poisson distribution (2), the confidence interval for X (based on the profile 
likelihood) is uniquely defined. 

The bounds of the confidence interval (X and X) for X are obtained by 



L = L(X„p<) = - N log (e A ' - 1) - £ N k log ( 



k=\ 



P, 
N k X, 



1). (16e) 



There are exactly two possible solutions (X,P) and (X,P). The algorithm is 
converging quadratically for any initial values (X\,P\) sufficiently close to the 
one of the solutions. 

The proof is found in Analysis (subsection 5.2). 

Formally, the above result holds true in the non-generic 

n 

cases ^^Nk^iN and Nk=N. If all samples contain just one 
fc=l 

n 

lineage, i.e., ^^A r / t>A r , the ML estimate is i = 0 and the 
fc=l 

confidence interval has the form [Q,X]. If all samples contain all 
lineages, i.e., N/ C = N the maximum likelihood estimate is 
X = co and the confidence interval has the form [X, + oo), hence 
it is infinitely large. Although, formally the result still holds, the 
asymptotic (11) is no longer true, as discussed in Analysis 
(subsection 6), rendering the result inapplicable if Assumption 

1 is violated. 

2 Asymptotic confidence intervals 

As an alternative to the profile likelihood, one can use the 
asymptotic normality of the maximum likelihood to construct 
confidence intervals. Asymptotically the difference of the 
maximum likelihood (0 = (X,p)) and the true parameter 
(00 = (Xo,p 0 )) is normally distributed. However, it is important 
to notice that - unless one eliminates one of the redundant 
allele frequencies - the Lagrange multiplier P needs to be 
treated like a regular parameter. The corresponding likelihood 
function is of course given by (5). Hence, the actual parameters 
involved are 9 = {0,P). The difference of the maximum 
likelihood (9>) and the true parameter (#o) is asymptotically 
distributed according to 



(17a) 



PLOS ONE | www.plosone.org 



5 



July 2014 | Volume 9 | Issue 7 | e97899 



Likelihood Estimate of Number of Co-Infections 



(A-*o)~JV(o,V(5)), 



(17b) 



where I^(&)= — E( — 2^5=5 * s t ^ ie ex P ecte d Fisher information 



5* 



2 l«=S ^ s tne observed Fisher information (based 



and Jn(9) 
on sample size N). The matrix — 



89 2 



9= jl is the transposed 



Hessian matrix given by (7). 

The expression (9 — 5rj) ~Af(0,I^ l (9)) is the convenient, 
although imprecise notation, for (7at($)) 5 ($ — #o) ~A/"(0,Jl), where 

I is the (n + 2)-dimensional identity matrix and (/a'(^)) 5 the 
symmetric square root of the Fisher information. Namely, any 
positive semi-definite, symmetric matrix A (as it is the case of any 
covariance matrix, and particularly the Fisher information) has a 
spectral decomposition A = ODO T , where O is orthogonal and D 
is the diagonal matrix that contains all eigenvalues. These are real 
and nonnegative, and the diagonal matrix that contains the square 

roots of the eigenvalues is denoted by D 5 . Hence, by setting 
A^ = Orho T , we have A=A^A^. 
An often used alternative notation is 



observed Fisher information are identical, i.e., J(9) = 1(9) = , 
when assuming (2). 

With some algebraic manipulation it is possible to simplify the 
expressions for the confidence intervals assuming the CPD (2). 

Result 3 Suppose the number of co-infections follow the conditional 
Poisson distribution (2) and that Assumption 1 holds. Then an asymptotic 
(1 — a)- confidence interval for X is given by 



*i 1) 

A+- 



\ 



(20) 



-Ne x (l+- 



k=\ 



-) 



Alternatively, the following formula, requires just the ML estimate for X 
zi-i^-l) 

X± , 2 (21) 



-Ne x 



\ 



1- 



E 



\ k=l eHN-N k )-N k J 



s/N(9-9 0 )~M(0J~ l (9)), 

or 



s/N(9-9 0 )~Af(0,J- l (9)) 

with 1(9) = jrI N (9) and J(9) = jzJ N (9). 

From (17) the asymptotic distribution of the parameters of 
interest 9 follows immediately by dropping the 'dummy' variable /? 
and the corresponding rows and column in the inverse Fisher 
information. Of note, this is not identical to 'formally' derive the 
inverse Fisher information based on L and 0. Namely, it is 
important to drive the asymptotic covariance matrix with respect 
to A and 9. 

Since (9-9 0 )~jV(0,J ff 1 (9)) the bounds for the (1-ot) CI for 
/to are given by 

X± Zl _^(J N \9)) n (18) 
and those for the components of/> 0 by 



For a proof, see Analysis (subsection 5.3). 

In the non-generic case = N for all k, the ML estimate is not 
unique, and we have X = +oo. Hence, asymptotic CIs make no 
sense in this case, neither for X nor for the frequencies pt- 

n 

In the case = N, it also impossible to derive CIs as the 

k=\ 

asymptotics (17) break down (cf. subsection 6 in Analysis). 

Explicit formulas for the CIs of the allele frequencies are 
obtained similarly. 

Result 4 Under the same assumptions as Result 3, an asymptotic 
(1 — a)- confidence interval for pj is given by 

















• A 

-n+ e Pk ' 
k=\ 1 
k*j ) 




-Pj\ 1 




(e XP J-D 2 








\ 


e kp i-\ 

1 


i 


1 " u 

\-n+ e Pk 
\ k k*) ) 


I" 


-V 2 



(22) 



p k ±z l _^(J N l (9)) k+u+l . (19) 

Here, z a denotes the a quantile of the standard normal 
distribution. 

Of course, when using the expected Fisher information, Jff 
needs to be replaced by In. Under the assumption of the 

8 2 A 

conditional Poisson distribution (2), the second derivatives — =- 

39 2 

needed to derive the Fisher information are calculated in Analysis 
(subsection 4.4; eq.39). Moreover, evaluated at the maximum 
likelihood estimate, N k = ENk, it is seen that the expected and 



The proof can again be found in Analysis (subsection 5.3). 

3 Testing the parameters 

In practice, data from several loci is typically available, each of 
which yields a different ML estimate or there might be some prior 
estimate for the parameters of interest. Depending on particular 
properties of the marker loci (mutation rate, allele-frequency 
spectrum, biochemical issues in determining motif repeats, etc.) 
different marker loci will lead to different ML estimates. Hence, it 
is desirable to test whether different estimates are significandy 
different. The confidence intervals can be adapted to test the 
parameters. 
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Clearly, at different marker loci, different alleles will segregate 
and the allele-frequency spectra will be very different. Hence, for 
the present purpose, it is meaningless to compare the allele 
frequencies at different loci. However, the estimate for X should be 
consistent, as this parameter is the same for all loci. Consequently, 
in the following we will focus on testing X and present three 
alternative tests for the null hypothesis Ho : X = Xq vs. the 
alternative H\ : X^£Xq. 

3.1 The likelihood-ratio test. The first test is rather 
straightforward. Since 



(23) 



under the null hypothesis X = Xq, it is rejected at significance level a 
if 



2(L-L«)> C1 , 1 . 



In other words, we reject the null hypothesis for any 2o that lies 
outside the ( 1 — (z)-confidence interval of X, which are obtained as 
outlined above in "Confidence intervals from the profile 
likelihood". Therefore, this test requires no additional numerical 
effort if the confidence intervals were already derived. 

The corresponding p-value is given by 



2(X-zW) 



e 2 
1 2xii 



dx. 



(24) 



column and substituting X = Xq, i.e., 



/ d^A 

dp 2 



d 2 A d 2 A \ 



dp„dpi dfidpi 



8 2 A 


8 2 A 


d 2 A 


dp i dp n 


' dpi 


dpd Pn 


d 2 A 


d 2 A 


d 2 A 


{ dpidp ■ 


• d Pn dp 


W j 



(27) 



where all derivatives are given by (39) and (40). 

Result 5 Suppose Assumption 1 and Xq # 0 holds. In the case of the 
conditional poisson distribution, the p-value under the null hypothesis 
Ho : X = Xo is given by (24), where ZP°' is given by (4) with X = Xq 
and Pq given by 



Pk- 



1 , /, N k Xo 



(28) 



where ft is the solution of(l6e) with X = X> . 
The solution ft is found by iterating 



( 1 , l V^i t\ N/cXo \ 



V 



N k Xo i 



(29) 



/ 



To calculate the p-value, needs to be derived first. 

Similarly as in section in "Confidence intervals from the profile 
likelihood", this leads to the equations VA^ (/>,/?) = 

( - — , • • • , -z — , ^ttt) = 0. Therefore, the system of equations 
dpi Cp n dp 



/ dA(X 0 ,p,/3) \ 
dpi 

dA(X 0 ,p,P) 

dp n 

dA(X 0 ,p,P) 
V dp ) 



= 0 



(25) 



needs to be solved by a Newton method, i.e., by iterating 



(p, + a,+i)=(Pt>p,)+(.Ap,M,), 



(26a) 



where (A/> t ,A/? t ) is the solution of the system of linear equations 



-g^M)=^(p„m^p t Ap t ) (26b) 

and (Pi,Pi) is any initial choice of 0 and /?. The derivative 
^S^iPfPt) i s obtained from (7) by deleting the first row and 



The proof is presented in Analysis (subsection 5.4). 

In case of Hq : X = Xq = 0, there are two possibilities. If 

n 

Nk > N, then L(0,p) = — go . Hence, the null hypothesis is 

fc=l 

always rejected. This is clear, because if Xq = 0 is the true 

n 

parameter, it is impossible to observe data X with N^>N (see 

k=l 

n 

Remark 7 in Analysis, subsection 6). However, if N/ C =N, then 

k=l 

X = 0 and L(Q,p) = L, and the null hypothesis is always accepted. 

Therefore, in the case of X$ = Q the test can still be formally 
performed in a meaningful way. However, note that the 
asymptotic (23) does not long hold true, as Ao = 0 does not lie in 
the interior of the parameter space. 

3.2 The score test. In the following, for any parameter 

choice Xq, let p 0 by the corresponding profile-likelihood estimate, 

i.e., p 0 = argmax L()^,p), where S n is the («— 1) dimensional 
peS n 

simplex. By using a dummy variable as before, p 0 is obtained from 

Mo '■ = (Po>Po) = argmax A(Ao,/»,/J). The Fisher information can 
(p./?) 

be written as 



In(Xo,Po) z 



'Vo 



A 0fQ 



where is obtained from the Fisher information with the first 
row and column deleted. The definitions of the remaining sub- 
matrices follow accordingly. 
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A test for the null hypothesis Ho : X = Xq vs. the alternative 
H\ : X # Xq is obtained by using the fact that 

(cf. Remark 6 in subsection 5.4 of Analysis). The function 



BL 



T{X)- 



81 u o-fo 



(31) 



serves as test statistic, where the data is X = (N\, . . . „N„). The test 
rejects Ho at the a-level if T(X)j£[ — Zj_»,Zi_i] The correspond- 
ing p-value is 2(1 -®QT(X)\)). 

Note that it is legitimate to write L on the left-hand side of (30) 
dL 8A 

because = tt . However, it is nevertheless important to derive 

dX OA 

the asymptotic variance from A. 

Alternatively, the expected Fisher information I(Xo,flo) in (30) 
and (31) can be replaced by the observed Fisher information 
J(Aofio)- However, if Xo is not the ML estimate, 
I{Xo,p.o)¥^J{Aofio)- As proven Analysis (subsection 5.4), one 
obtains for the CPD: 

Result 6 Consider the score test for the null hypothesis Ho : X = Xo vs. 
the alternative H\ : Xj^Xo under the assumptions of Result 5. The test 
statistic based on the observed Fisher information is 



t=Vn- 



E 



e x ° - 1 fr[ A e^Pk - 1 



\ 



2 :2Z^ M I 1 



(^o-l) 2 X 2 0 fi N 

n 

(Xo + n-Y^e 1 ^) 1 

J k=i 

Xl "(/oPk-lf 



e hiPk — 1 



(32) 



+ 



and that based on the expected Fisher information is 



information as they coincide (cf. section "Asymptotic confidence 
intervals"). 

In summary one obtains: 

Remark 5 Under the assumptions of Result 6) a test statistic for the null 
hypothesis Hq : X = Xo vs. the alternative H\ : X # Xo is 



e A 0 

7o~\ 



y^ N k Pke A ° Pk 



T=VN(e*-l)- 



k=\ 



N e x oPk - 1 



\ 



(34) 



1 + 



V 



fc=i 



where N and n are sample size and number of alleles, in the data 
yielding the estimate X. 

The proof is analogously to the one of Result 6. 

n 

The test cannot be applied in the special cases Nk = N or 

k=l 

Nk=N for all k, as the asymptotic (30) no longer holds true (cf. 
subsection 6 of Analysis). 

3.3 The Wald test. A third test for the null hypothesis 
Ho : X = Xo is an adaptation of the Wald test for the profile 
likelihood. It is based on the same asymptotic properties 
that we used to derive confidence intervals namely 
(9 — $o) ~ A/"(0,7j^ 1 (9)). This is exactly the same as the asymptotic 
9-9 0 ~M(»,Jn\»)) as J N (9)) = I N (9)). 



This implies (X-X 0 )~Af(0,(I^ 1 (9)) n ) or 



(X — Xo)~Af(0,\). Hence, the test statistic 



1 



\Avw)i, 



T(X) = 



X — )uQ 



can be used. The p-value is 2(1 — <&(\T(X)\). 

Now, we shall consider again the CPD. An explicit expression 
for (I N '($))! i is given by (54). Hence, we obtain: 

Result 7 Under the assumptions of Result 5, the Wald test for the null 
hypothesis Hq : X = Xq vs. the alternative H\ : X # Xo has the test statistic 



t=Vn 



Jo - eV* — 

f-^ N Fk e A oPk - 1 







( \ 






e x o 


1+ 








n-Y^OPk 




\ 









(33) 



The p-values are 2(1 —<S>(\T(X)\)) in either case. The frequencies 
Po = (p\ , • • • ,p n ) are derived as specified in Result 5. 

Of note, instead of (30) the ML estimate can be used as 
a plug-in estimate for the asymptotic variance, i.e., 
dL 

^k^~ N ( 0 ' I u- I fa I hhW>)- In thls case ' 11 15 not necessar y 

to distinguish between the expected and observed Fisher 



T(X) = (X-X 0 ) 





( \ 


-Ne l 


1 1 


e x -\ 






v h J 



(35) 



based on the (expected or observed) Fisher information. 

The p-values are 2(1 — 0(| T(X)\) in either case. Here, X and the 
frequencies p 0 = (p\ , . . . ,p n ) are derived as specified in Result I. 

Alternatively, if the profile-likelihood estimate based on Xo is 
used as a plug-in for the asymptotic variance, one can employ 

(3-3 0 )~A/X0,J^ Wo)) or (9-9 0 )~Af(OJ^ l (X 0 ,fi 0 )). 

In the first case, using (53) implies that the test statistic changes 

to 
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T(X) = (X-lo) 



-Ne x o 
(cfo - 1) 2 /; 



o k = 1 



y e ^0Pk - 1 



(Io + m-^Vo^) 2 



+ 



(36). 



k=\ 



^ ^ (e ; o« - l) 2 



E 



In the second case, (54) implies that the test statistic changes to 



T{X) = {X-h) 



e x o-l 



1 



(37) 



Also the Wald test cannot be applied in the special cases 

n 

Nk = N or Ni c = N for all k, as the asymptotic for ($ — #o) no 

/t=l 

longer holds true (cf. subsection 6 of Analysis). 

4 Testing the method 

Although - as we have seen - most of the theory works quite 
general, assuming a CPD for the number of co-infections permits 
to derive explicit results or, at least, reduces the complexity 
significantly. However, assuming a CPD might not be justified. 
Therefore, it is desirable to have a test for the model's fit. Namely, 
let 



L N = n i log 

fe{0,l}"\{0} 



be the likelihood assuming a perfect fit to the data, in which the 
expected frequencies of infection with stain configuration i equal 
their observed frequencies. In other words, Lfj is the maximum 
likelihood of the saturated model. As there are 2" — 1 possible 
allelic configurations i infecting a host, JLy has 2" — 2 degrees of 
freedom. The maximum likelihood L = L(l,p) of the reduced 
model (assuming the CPD) has n — 1 independent allele frequen- 
cies and one Poisson parameter. Therefore, 



2{L N -L)> 



(38) 



Hence, the following test can be used. 

Result 8 To test Hq : "the conditional Poisson distribution is justified" 
vs. H^: "the conditional Poisson distribution is not justified", the test-statistic 
T(X) = 2(Ln—L) can be used. The p-value is given by .x^„_ n _ 2 (T(X)) 

It should be mentioned that the above test might perform poorly 
if the number of lineages or alleles n is large. The reason is that the 
X 2 distribution has too many degrees of freedom. This might be 
the case when using hyper-mutable microsatellite markers with 1 0 
or more alleles found across samples. 



Application to data 

As an illustration, the methods are applied to three previously- 
described data sets [25-27]. Each of which comprises molecular 
data from P. falciparum-infected blood samples from endemic areas 
with different levels of malaria incidence. For each blood sample, 
parasite DNA was extracted and several microsatellite markers 
assayed. 

1 Preliminary remarks 

It is important to note beforehand that only (selectively) neutral 
markers should be included in the analysis. Namely, loci linked to 
others that are targets of selection (e.g., mdrl, crt, dhfr, dhps in P. 
falciparum that are associated with selection for drug resistance) will 
have skewed allele-frequency distribution. Hence, using these 
markers might lead to artifacts and severe misinferences. In 
practice, a marker located on a chromosome not carrying a 
strongly selected gene (e.g. resistance-conferring gene), can be 
regarded to be neutral. Moreover, clinical samples from groups 
that will be compared need to consider confounding effects such as 
differences in treatment polices, control interventions, and 
changing transmission intensities (e.g., a group should not contain 
samples from two time points during which treatment policies 
changed). By not considering such effects, the estimates of MOI 
would be inappropriate. For these reasons, we only used parts of 
the available data sets. 

2 Data description 

The first data set emerged from a longitudinal study conducted 
in Asembo Bay, a hyper-endemic region in Kenya, and was 
described in [27]. We included five (neutral) microsatellites on 
chromosome 2 and four (neutral) markers on chromosome 3. 
Additionally, we included two markers on chromosome 8, quite 
close to dhfr, which are common to all three data sets and meet 
Assumption 1 . Only blood samples collected in the first study year 
(mid 1993 to mid 1994) were included, resulting in 42 blood 
samples. 

The second data set described in [26] is from a study from 
Yaounde, Cameroon, a region of intermediate/high transmission. 
Besides the two markers on chromosome 8 mentioned above, we 
included all eight available (neutral) microsatellite markers on 
chromosomes 2 and 3 from all 331 blood samples (data of one of 
the 332 original samples was unavailable). 

The third data set is from Bolivar State, Venezuela, a region of 
low transmission. It was described in [25] and consists of 97 blood 
samples. Due to the low transmission intensities, for most markers 
each blood samples contains only one allele, violating Assumption 
1. We included all markers that met Assumption 1 as well as all 
available neutral markers. Particularly, we included four on 
chromosome 2 and three on chromosome 3, two markers on 
chromosome 8 and one on chromosome 4, which are sufficiently 
distant from respectively dhps and dhfr to be considered neutral, 
and the two makers on chromosome 4, which were also included 
in the other data sets. All 97 blood samples were used. 

3 Results 

The results are summarized in Figures 1 and 2 and Tables 1-3. 
In all cases, the test for the model fit (cf. Result 8) justified the 
assumption of the CPD (cf. Tables 1-3). This is important because 
the three locations exhibit different transmission intensities. In all 
three regions, the ML estimates X or rather the mean MOI, 
le l 

—. , obtained from different marker loci are fairly consistent. 
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Figure 1. Shown are the ML estimates 



(dots) and their 



respective profile-likelihood-based (blue) and asymptotic 
(green) CIs for the data from Kenya (A), Cameroon (B) and 
Venezuela (C) for several microsatellite markers each. 

doi:1 0.1 371 /journal.pone.0097899.g001 

As expected, most variation in the estimates is observed in Kenya 
because of the low sample size. Moreover, the transmission 
intensities are stronger, which leads to more variation in allele- 
frequency spectra among marker loci, resulting in more variation 
among the ML estimates. 

From Figure 1 it is apparent that the estimates for MOI are 
highest in Kenya, followed by Cameroon, whereas they are very 
low in Venezuela. This is summarized in Figure 2 showing that the 
average ML estimates across the regions differ by several standard 
deviations. 



1.4- 



1.3 



1.2 



1.1 



1.0- 



Kenya 



Cameroon 
region 



Venezuela 



Figure 2. Average ML estimates by region. Averages are the 
arithmetic mean of the ML estimates + 2 standard deviations derived 
from the microsatellite loci, which are common to all data sets, 
including (blue) and excluding (green) locus LI, which appears to be 
hyper-mutable in Kenya and Cameroon. 
doi:10.1371/journal.pone.0097899.g002 



The 95% profile-likelihood CIs for 



■1 



given by 



1 



1 



are reasonably large for the data sets from 



Cameroon and Venezuela (cf. Figure 1). However, due to the 
relatively small sample size, they are much less informative for the 
Kenya dataset. 

The asymptotic confidence intervals agree well with the profile- 
likelihood CIs (cf. Figure 1 and Tables 1-3). This is particularly 
true for Cameroon, as expected because of the large sample size. 
The profile-likelihood CIs from the Kenya and Venezuela data are 
asymmetric while, the asymptotic CIs are - by definition - 

symmetric (however, the transformation x\— » — - — - results in some 

asymmetry). (Note that, unlike profile-likelihood-based intervals, 
asymptotic CIs are not transformation respecting, i.e., 



Xe A - 



— -,— is the transformed CI of/, not the CI of — .) 

■ei-lV-l e*—l 
In relative terms, this is more pronounced in Venezuela than in 

the Kenya data set. The reason is that the ML estimates (1) from 
the Venezuela data are close to zero, i.e., the boundary of the 
parameter range. This results in a very skewed likelihood function, 
yielding quite asymmetric profile-likelihood CIs. On the contrary, 
in Kenya, the ML estimates are rather large, and the likelihood 
function tends to be symmetric around its maximum. 

Furthermore, we tested for pairwise differences between the 
estimates based on different marker loci. Tables 4-6 report the p- 
values for the likelihood-ratio, the Score, and the Wald test for the 
three regions. In all data sets, all tests perform equally well. There 
are some discrepancies, mainly due to the above mentioned 
skewness of the likelihood function. In the case of a skewed 
likelihood function, the likelihood-ratio test is the most preferable, 
because it accounts for the skewness. 

Tables 7-9 compare the three versions of the Score test, while 
Tables 10-12 compare those for the Wald test. The results are 
fairly consistent. However, the versions given by eqs. 34, 37 and 36 
of the Score and Wald tests, respectively tend to be most 
inconsistent with the other tests, especially the likelihood-ratio test. 
The reason is that these use the roughest approximations. 

Overall, the methods perform well for all data sets and provide 
meaningful results. However, the statistical tests also yielded 
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Table 1. Estimates for each locus of the data set from Kenya. 



locus 


lower bound 


).e l 
e A — 1 


upper bound 




d.f. 


U7 


1.00194 


1 .03409 


1.15244 


6.40471 


9 




0.968395 




1.10265 






L5 


1.21506 


1.38975 


1 .64696 


67.8528 


15 




1.18622 




1.61235 






J3 


1.16387 


1.32208 


1.56625 


44.993 


16 




1.1331 




1.52876 






J6 


1.13457 


1.27344 


1.49108 


58.3296 


15 




1.10558 




1.45595 






U6 


1.15044 


1.29506 


1.51735 


65.1444 


14 




1.12211 




1.48319 






L4 


1.18509 


1.34319 


1.57899 


89.2578 


18 




1.15735 




1.54568 






U5 


1.16453 


1.31318 


1.53811 


76.1215 


20 




1.13692 




1.50489 






K6 


1.31334 


1.51443 


1 .7943 


1 34.024 


26 




1 .28687 




1.76291 






LI 


1.3654 


1.59303 


1 .90742 


87.4142 


16 




1.33699 




1.87367 






c4 


1.15248 


1.30977 


1.55585 


15.9715 


7 




1.12049 




1.51705 






b3 


1.06529 


1.16656 


1 .34475 


34.7327 


16 




1.03537 




1.30777 







Each row shows, locus name, lower profile-likelihood (top) and asymptotic (bottom) confidence bound, ML estimate, upper profile-likelihood (top) and asymptotic 
(bottom) confidence bound. For the confidence, bounds a = 0.05 was assumed. Moreover, the test statistic for the fit of the CPD (2) is shown as well as the 
corresponding degrees of freedom. In all cases, the outcomes are not significant, suggesting that the assumption of the CPD is justified. 
doi:1 0.1 371 /journal.pone.0097899.t001 



significant differences in some of the pairwise comparisons of the 
various X estimates in each region (Tables 4-12). The allele 
frequencies differ of course but all are based on the same true 
parameter X. If the estimates for X are significantly different, some 
of them cannot be trusted. This can have various reasons. First, it 
can be a type I error. However, this occurs only with small 
probability if the CIs are well calibrated, i.e., their nominal 
coverage (1 — a) is close to the actual coverage. Asymptotic CIs and 
tests based on them (Wald, Score) will be more affected than 
profile-likelihood-based intervals, because the former are inher- 
endy forced to be symmetric. This is particularly true if the 
estimates for X are close to zero. To quantify this effect, and to 
suggest heuristic methods to recalibrate the CIs, a systematic 
numerical robustness study of the approach is planned. Prelim- 
inary investigations, however, have shown that particularly the 
profile-likelihood-based CIs are well calibrated. 

Second, the tests are designed to compare the ML estimate 
based on the data with a value An, which has to be interpreted as 
prior knowledge. Strictly speaking, it is not meant to be estimated 
from data itself, or at least data which is available. A test designed 
to compare two estimates, should incorporate information from 
both data sets (data from both markers). A standard approach to 
resolve this is as follows. One could calculate the product of the 
maximum likelihood from both markers and compare it with the 
maximum likelihood of both markers conditioned on equality of X. 
This however would require much more numerical effort than the 
tests here. Note further, that the structure of the data does not 



allow to perform a permutation test, because the allele-frequency 
distributions are expected to be different. This is true for two 
different marker loci in the same endemic region as well as for the 
same marker in two different populations. 

Third, the model assumptions might be violated, i.e., the 
underlying Poisson distribution might not be correct. This can 
again be quantified in the coarse of a robustness study. 

Fourth, the allele-frequency spectra of two different marker loci 
is very different, and the method might be sensitive to this. For 
instance strong skewness in the data distributions might bias the 
estimates. This is obviously the case if one marker shows no 
variation at all. Moreover, the number of different allele at 
different markers is very different, which results in very different 
probabilities of the ML estimates. These issues again need to be 
investigated in a numerical study. 

Fifth, some STR markers tend to be hyper-mutable. As a result, 
not just the frequency distribution might be more problematic, but 
it is also more challenging to correctly identify the tandem repeat 
numbers. Hence, for hyper-mutable markers the data might have 
very bad quality. In our examples the marker labelled LI appears 
to be hyper-mutable. 

Because of all these possible reasons, it would be pre-mature to 
suggest a heuristic on how to decide, which estimates can be 
trusted the most. A systematic numerical follow-up study is 
planned to investigate all these possibilities in detail to provide 
suggestions on the criteria upon which the data is chosen. 
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Table 2. See description of Table 1 but for the Cameroon data set. 





locus 


lower bound 




upper bound 




d.f. 


L5 


1.12239 


1.17804 


1.23538 


165.239 


27 




1.12754 




1 .24098 






J3 


1.11596 


1.17407 


1 .23404 


105.218 


26 




1.12171 




1 .24032 






J6 


1.15263 


1.21385 


1 .27704 


178.18 


25 




1.15774 




1.28258 






U6 


1.17975 


1.23815 


1 .29829 


270.763 


32 




1.18389 




1.30274 






L4 


1.17469 


1 .24032 


1.30817 


222.664 


29 




1.17986 




1.31378 






U5 


1.18476 


1.25169 


1.32089 


195.916 


24 




1.18987 




1.32643 






K6 


1.18436 


1.24819 


1.31408 


294.437 


40 




1.18908 




1.31919 






LI 


1 .28997 


1.36794 


1.44861 


332.781 


40 




1.29451 




1.45349 






c4 


1.08125 


1.20363 


1.33427 


0.958866 


9 




1.10312 




1.36155 






b3 


1.1223 


1.18418 


1.24816 


75.4321 


27 




1.12849 




1.255 







doi:1 0.1 371 /journal.pone.0097899.t002 



average MOI. Assuming a CPD, the likelihood function simplifies 
as well as the procedure to derive the ML estimates. Although, this 
was previously described by [20], we were able to derive a number 
of important results: First, the ML estimate always exists and is 
unique. Second, it has the intuitive interpretation of being the 
parameter vector under which the observed are the expected 
prevalences for the distinguishable lineages, i.e., the observation is 
the expectation, if the ML estimate is the true parameter vector. 

Third, the recursion to compute the ML estimate for X reduced 
from a multi- to a one-dimensional recursion, which just depends 
on the number of samples N and the observed prevalences. The 
ML estimates for the lineages frequencies are explicit functions of 
X. Fourth, the recursion for X converges (at least) from every initial 
value Xq > X. Convergence is monotonically, at quadratic rate, and 
typically occurs within a few iterations. Besides the obvious 
computational advantages provided of our results their actual 
foremost importance is that they justify the ML approach. Using 
an ML estimates is only appropriate if it has a significantly higher 
probability than distant alternative parameter choices, which is 
difficult to evaluate in a multi-dimensional space. However, the 
form of the ML estimate here - particularly because the lineages 
prevalences depend continuously on X - indicates that the 
observation will have significantly lower probability under distant 
alternative parameter choices. The method worked well for the 
three malaria datasets to which it was applied, and gave similar 
results when applied to different independent microsatellite loci. 

Although, our results justify the ML approach, it is nevertheless 
of fundamental importance to provide confidence intervals (CIs). 
We reported here on asymptotic and profile-likelihood-based CIs 
for all parameters. Asymptotic CIs are either based on the 
observed or the expected Fisher information, which under the 



Discussion 

The number of genetically distinct lineages co-infecting a host - 
commonly referred to as "multiplicity of infection" (MOI) - is a 
key quantity in epidemiology. First, it relates with transmission 
intensity since it provides a metric for the number of secondary 
infections after a primary infection; assuming that the lineages 
circulating are identifiable (e.g. secondary infections within a 
clonal outbreak simply cannot be traceable). Second, it measures 
the possibility of genetic exchange among those lineages as 
determined by the genetic system of the pathogen in question. 
Finally, if phenotypic differences are associated with those 
lineages, MOI could lead to very complex dynamics driven by 
natural selection. 

Measuring MOI is desirable in a variety of infectious diseases, 
but - in many instances - only feasible if it can be measured at low 
cost and with a reasonable effort. Optimally it should fit into 
standard study designs and should be easily computable with 
whatever genotyping data can be collected from clinical 
specimens. In order to meet these goals, we further developed 
the maximum-likelihood (ML) method originally proposed by [20] 
and applied it to three malaria datasets as examples. 

From a total of N samples (e.g. blood samples), the number of 
genetically distinguishable lineages present in each host are 
recorded. From the resulting data, assuming that hosts are 
infected randomly by those lineages according to their prevalence, 
we derived the likelihood function. If infections with the pathogen 
are rare events, a natural choice for the number of co-infecting 
lineages is a conditional Poisson distribution (CPD). This 
distribution comes with the appealing feature that it is character- 

Xe x 

ized by a single parameter X, whose transform — is the 



PLOS ONE | www.plosone.org 



12 



July 2014 | Volume 9 | Issue 7 | e97899 



Likelihood Estimate of Number of Co-Infections 



Table 3. See description of Table 1 


but for the Venezuela data set. 








locus 


lower bound 


e x — \ 


upper bound 


2U*r£i) 


d.f. 


J3 


N/A 




N/A 


N/A 


N/A 




N/A 




N/A 






J6 


N/A 


1 


N/A 


N/A 


N/A 




N/A 




N/A 






U6 


N/A 


1 


N/A 


N/A 


N/A 




N/A 




N/A 






L4 


N/A 


1 


N/A 


N/A 


N/A 




N/A 




N/A 






U5 


0.974273 


1 .02745 


1.08251 


8.32780 


8 




1 .001 56 




1 .1 2327 






K6 


f) 971 OR"? 

U. _/ / I uoz 


1 03104 


1 09339 


8 nfifii n 

O.UUU 1 \J 


3 




1 001 76 




1 1 3908 






L1 


U. _70*T JZ.U 


1 04242 


1 10251 


n onoon 

U .\J\J\J\J\J 


2 




1 00703 




1 13188 






c4 


0 973367 


1 02863 


1 08592 


9 79400 


3 




1 001 63 




1 1 273 






b3 


0.99278 


1.06223 


1.13479 


3.66900 


4 




1.01538 




1.16345 






fr13 


0.981231 


1.05152 


1.12504 


0.20579 


3 




1 .00852 




1.16137 






ps6 


0.98346 


1 .04538 


1.1098 


0.00000 


2 




1.00752 




1.14139 






ps7 


0.978848 


1.02256 


1.06754 


1.01430 


4 




1.00128 




1.10032 









N/4 indicates that that the method is not applicable (cf. Analysis, section 6). 
doi:1 0.1 371 /journal.pone.0097899.t003 



CPD coincide. Explicit formulas for the CIs for all involved 
parameters were derived. Profile-likelihood based CIs were 
already emphasized by [20]. However, it was important to note 
that they can actually be derived at low numerical costs by using 
the method of Lagrange multiplies. This reduces the numerical 
effort to the same magnitude as for the ML estimate. Assuming the 
CPD, we proved that the CI for the parameter X, yielding the 
estimate for the MOI, is uniquely defined. The confidence bounds 
are derived by a two-dimension recursion, which converges locally 
at quadratic rate. Both kinds of CIs gave meaningful results for the 
three data sets to which we applied the methods and they agree 
well. Although the asymptotic CIs are easier to derive, we suggest 
to use the profile-likelihood-based CIs if sample size is low and/or 
the ML estimate for X is small for the reasons discussed in the 
application section. Although, we discussed CIs for the linages' 
frequencies, these are somewhat less interesting, unless one focuses 
on the prevalence of a particular linage. Otherwise one should 
derive confidence regions on the simplex for the lineage 
frequencies, which is done as outlined, but numerically more 
demanding. 

To test the ML estimate against other parameter choices 
typically three statistical tests are used, the likelihood-ratio, the 
Score, and the Wald test. The latter two are based on the 
asymptotic CIs, while the likelihood-ratio test builds upon the 
profile-likelihood-based CIs. Motivated by our intention to apply 
the methods to malaria we focused on using these tests to compare 



estimates for the parameter X. Namely, several genetic markers 
characterizing linages are typically available (e.g., several micro- 
satellite markers), to all of which the methods are applicable. While 
the true parameter X is of course the same for all markers, the ML 
estimates obtained from them will differ. It is therefore important 
to test whether these estimates differ significantly. The parameter X 
changes on temporal and spatial scales. An obvious question is, 
whether MOI changes over time (e.g. before and after the 
implementation of control measures) or varies across endemic 
regions. Hence, it is important to test for significant differences in 
estimates for X. 

Not surprisingly all tests described perform equally well as they 
are asymptotically equivalent. However, as in the case of CIs we 
suggest to use the likelihood-ratio test if sample size is small or the 
parameters compared are small. If interested in p-values additional 
effort is required for the likelihood-ratio test, because a two- 
dimensional iteration needs to be performed. However, numeri- 
cally this is only as demanding as obtaining the CIs. Because the 
test statistics for the Score and Wald tests can be derived, it is easy 
to derive p-values in these cases. For each of these two tests we 
provided three alternative variants, which all worked almost 
equally well in the provided examples. We should point out that it 
was our intention to indicate only how tests for the parameters can 
be constructed. With the usual approaches one could compare 
multiple parameters at the same time, including the information of 
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all these markers. This however, exceeds both our intention and 
the scope of this article. Finally, as a justification for using the 
CPD, which simplifies the method to a great extent, we 
summarized the test suggested by [20]. Although the test will be 
uninformative if many lineages are present it provides a 
justification for the approach. Of note, the CPD is an intuitive 
assumption if infections are relatively rare events. This does not 
relate with the overall prevalence but rather with how high the 
observed incidence is in a given population in terms of the time 
scale required for the pathogen to complete its transmission cycle. 
Such relationship is hard to establish without complex simulations 
but it is worth noting that there could be biologic scenarios 
(particular pathogens or epidemiologic settings) where this 
assumption does not hold. Thus, it is advisable to check whether 
the CPD assumption is violated using the tests for the model fit 
proposed in this investigation. In our case of study, we observe 
robust estimates across very different epidemiologic settings. 
Overall, the methods developed here can be used to compare 
groups under different exposures, different manifestations of disease, 
groups of patients that have different genotypes (e.g. sickle cell or 
any other hemoglobinopathies associated with protection), or the 
efficacy of a given vaccine. Biologically, this method assumes that 
the rate of evolution of the marker used is "low" relative to the time 
of the infection. That is, there is a "numerable" set of lineages that 
can be estimated and no variants are generated during the time 
scale of one infection. Thus, it is not suitable for pathogens such as 
HIV or any other hypervariable virus. The second assumption is 
that the set of markers used to detect and characterize the MOI are 
effectively neutral, so they are not linked to genes under selection. 
Thus, the loci cannot be associated with antigens or drug resistance. 
As presented, each loci is considered independent, which is a typical 
assumption of genotyping base approximations used in molecular 
epidemiology. We also want to emphasize that this MOI estimate 
depends on the number of detectable lineages given a laboratory 
method. Thus, results from different markers such SNPs or 
microsatellites are expected to differ as a function of their differences 
in mutation rates and mode of evolution. One could actually 
calculate the fit of individual loci and then exclude potential outliers 
if there is any biological reason to do so (e.g. microsatellites under 
different evolutionary models where one is hyper-variable or non- 
variable when compared with others). The method is sensitive 
enough to detect differences in MOI under different epidemiologic 
settings as indicated by the analyses of empirical data. Whereas this 
is not per se a "genomic" method, in the sense that is not designed to 
estimate MOI directly from reads generated from next generation 
sequence (NGS) data, it can do so from a given set of SNPs or 
microsatellites detected by using NGS. Whereas the method was 
originally intended for applications to malaria, it can be applied to 
other parasitic or microbial diseases where the assumptions are not 
violated. E.g. variation on the VNTRs in a multi-clonal infection of 
Mycobacterium tuberculosis. Unlike empirical approaches where 
simply alleles are counted and then averaged, the proposed ML 
method provides a robust and computationally efficient statistical 
framework that can be integrated in epidemiological investigations. 

Analysis 

1 The Model 

1.1 Background. Here, Qj given by (1) is explicitly derived 
under the assumption that K m is given by the CPD (2). Namely, 



sign m=i, 
\m\ =m 

i co iffli +...+m n 

= 1 v V — n mi n mn 

^ ^ m l \-...-m n \ Jl '" ln 

m=\i\ . m - . 1 " 
sign m=i, 

\m\ — m 

e x - \ 2-i„ k=\ m k \ 

sign m — /, 
\m\—m 

e x -\k=\. / —' /! e x -lk=v. 

—n^-iyt, 

e A — 1 k=\ 

where in the derivations the condition i k = 1 indicates that the 
product is taken over all non-zero components of i, corresponding 
to the alleles found in a sample with allele configuration I. 

1.2 Log-Likelihood. Assuming that the number of lineages 
infecting a host follows the CPD (2), the log-likelihood (3) simplifies 
to 

L = L(k,p) = £ m log -— f IT (e** - 1)'* 

i 

= - N log (e x - 1) + m £, ik log ( - e>Pk ~ X) 

i k=\ 
n 

= -N\og(e x -l)+Y / N k log - 1) , 

k=\ 

where 



N k=Yl krti = ikHi 

i (€{0,1}" 

is the number of samples that contain allele A k . Notably, 

n 

N/c>N with equality only if all samples are single infections. 
k=\ 

1.3 Proof of Remark 1. The proof of Remark 1 is as follows. 
Proof of Remark 1 . First, note that 

e A Pk-\ 
urn— - — — =p k . 
x^o e A -l 

Moreover, using de l'Hospitals rule we see that 

" J-Vk — 1 " 

lim L = lim V N k log -— - (N- V N k ) log (e x - 1) 

A — >U A — >0 , , Q 1 , , 

k=\ k=l 
n n 

= ]T N kPk - (N - Y N k ) lim log (e x - 1) = - oo , 

k=i k=\ 



E 
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because N < Nk (note that this holds also true if p k = 0 for 
k=l 

some k). This proves that X = 0 is not a maximum likelihood 
estimate, which is quite intuitive. □ 
1.4 Derivatives of the log-likelihood. Assuming the CPD 
(2) the log-likelihood function is given by (4) and the derivatives of 
(5) are hence straightforwardly calculated to be 



8A BL „ e x -A „ p k e kp k 



BX BX 



+ y Nk J^L^, ( 39a 



8 A w Xe'fk 
opk e p k - 1 



OA 
Jp 



= l-& 



BL _ Xe^k 
8pk~ k e x Pk-\ ' 



and since L does not depend on p 



dL 



(39b) 



(39c) 



(39d) 



(39e) 



The entries of the Hessian matrix (7), i.e., the second derivatives 
of A, given by (5), are calculated to be 



B^A 
BP 2 



= 0. 



(40g 



2 Proofs of the main results 

2.1 Existence and uniqueness of the ML estimate. First, 
the result showing existence and uniqueness of the ML estimate in 
the generic case is proven. 

Proof of Result 1. Assume X^O, as this cannot be the ML 
estimate according to Remark 1. Equating (39b) to zero yields 
Xe Xpk 

Nk — j- = P for all k. Substituting this into (39a) and setting the 

Xe l 



equation to zero yields P = N- 

Xe^ Ar Xe x 
Nk— t =N— — - or 



Therefore, we obtain 



e lVk — 1 



e A -l 



Pk= -jlog[ 1 



Nk 
N 



(1- 



(41) 



proving the last assertion. Hence, it remains to prove the 
statements for X. 

By using (41) and equating (39c) to zero, we obtain 



Nk t 
N 



(1 — e ") I , which is equivalent 



to 



f(X): = ;. + ^io g (l-^(l- e -M=o 



N 



(42) 



Therefore, the ML estimate is a solution of (42). Straightforward 
calculation gives 



a 2 A d 2 L Ar e" ple"''k 

— t = — t=N— t-> N k — 1 i 4° a 

BX 1 BX 2 (e x -l) 2 fri (e^Pk-l) 2 ' v 



2 e x Pk 



d 2 A d 2 A 8 2 L 8 2 L 



Nk^z 



e x Pk 



Xpk 

BAdpk ~ BpkdX ~ 8X8p k ~ 8p k dX ~" K e^k - 1 V' " e^k - \ 



,(40b) 



and 



Nk -i 

*-n-^(i- e - 4 ) 



8 2 A _ 8 2 L _ -NkX 2 e x Pk 



(40c) 



N k/ , N k 



/"«=£ 



N 



(1 



N 



)e~ 



k=\ 



8 2 A 8 2 L 



BpkBpj BpkBpj 



= 0 iork^j, 



B l A B l A 



8p k Bp BpBpk 



■ — 1 for k =£ j, 



8 2 A 8 2 A 



8p8X 8XBP 



= 0, 



(40d) 



(40e) 



(40f) 



Note that, /(0) = 0 and /'(0) = 1 - £ <0, because 



jt=i 



N 



J2N k >N. Hence, /(A)<0 near zero. Note further that 
fc=l 

lim /(I) =oo. Hence, /(/.) = 0 has at least one positive solution. 

Since, Nk<N for at least one k, f"(X) > 0, implying that f(X) is 
stricdy convex for X>0. Because / is strictly convex there can be 
at most one positive solution X of f{X) = 0. Moreover, / is stricdy 
monotonically increasing for X > X. 

The solution can be found by a Newton method. Because f{X.) is 
stricdy convex and monotonically increasing for X > X, the Newton 
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method converges monotonically to the solution k. Moreover, 
because /"' is continuous, the rate of convergence is at least 

quadratic. Noting that k t+ i =k t — yields (9) completes the 

/ yk t ) 

proof. □ 
The special case, in which only single infections occur, is 
summarized by Remark 2. It can be proven as follows. 

Proof of Remark 2. Examining the proof of Result 1 yields that 
that the ML estimate is any positive root oif(k) = 0. In the present case 
f(0) = f'(0) = 0. However, since Nk<N must hold for at least one k, 
f is still strictly convex. This implies that f(k) > 0 for all k > 0. Hence, 
no maximum likelihood estimate with k > 0 exists. 
Moreover, since 



lim L= lim ViViloge^-l^-l-CiV-ViV/Jlog^-l) 

A-»co X->co^ — ' £ — ' 



k = \ 



k = l 



= ^jVfclogl-0=-oo 

k=l 

the ML estimate can only be attained at k = 0. 

In the limit k-*0, one obtains, as in the proof of Remark 1, 



lim L = lim N k \ogp k , 

X— >0 X — y0 , , 
k= 1 

N k 

which is maximized at pk = . Particularly, the likelihood 

function is finite in this case. □ 
In the other non-generic situation, every lineage is found in all 
samples, which is described in Remark 3 and can be proven as 
follows. 

Proof of Remark 3. The proof of Result 1 yields 
f(k) = k{\ — n). Hence, f(k) = 0 has no positive solution, and 
hence no ML estimate with k>0 exists. Clearly, Remark 1 states 
that k = 0 is also not an ML estimate. 

In this case the log-likelihood function simplifies to 



L{k,p)= lim -7Vlog(e A -l)+ ViVlog(e ; «-l) 

A->oo f — ' 



Since L = Q implies that the likelihood is one, this limit case, 
which is - of note - independent of the allele-frequency 
distribution, is the maximum likelihood. □ 

Remark 4 states that the expected number of samples 
containing a given lineage equals the observed number of samples 
containing this allele if the ML estimate is the true parameter. The 
proof is as follows. 

Proof of Remark 4. The maximum likelihood estimate 



satisfies VA(0,/?) = 0. Equating (39b) to zero yields Nk - 



P 



e Xpk _ i 

for all k. Substituting this into (39a) and setting the equation to 



zero yields p = N - 



ke 



Therefore, we obtain 



ke Xpt ke x \-e~ Xpk 

N k —, t=N—. — - or Nk=N— Hence, it remains 

K e Xpk-\ e >-\ l-e~ x 

to be shown that Nk = EiVjt holds. 

ft; 

In the following we will use that Q/ = E — . To simplify the 
notation assume k = n. Hence, 

E7V„ = E( Y, '><')= X '« Ewi= X '" NQi 

<e{0,l}"\{0} fe{0,l}" fe{0,l}" 



+ 



n 

fe{0,l}« ' 

n-l 

(jto _l) ^ II (e kp J - Yp 
iE{o,i}"- 1 '' 

(n-l 
(e^n-X-X) Y ri(e^-l)'> 
fe{0,l}"- 2 ''~ 

Y: n ( ^-i)A 

(e Xp « - l)e Xp «- 1 V" II (e Ap ) - 1Y> . 



N 

N 
e x -\ 

N 
e x -\ 



n n 



= -N log (e x - 1) Y,Pk + X ^ lo g ^ Pk ~ l ) 

k=l k=\ 

» e *Pk-l " l-e-^k 

Taking the limit A— >co yields 



lim 

X->OD 



L(k, P )=\^Y m °S ( [_ e e -xy k 

" l- e ->-Pk 
= > iVlog lim — j-7- 

n 

= ]Tiviogi=o. 



k=\ 



Successively repeating the last step gives 

EN n = ^-—(e Xp "-l)Tle Xp J 

e A — \ j=l 

= -i — r — i Lle Ap J = - — - — . « 

e"- — 1 e kpn j=i e' — l e Ap " 

\_ e ->Pn 



--N 



l-e- 



Since the alleles can be arbitrarily labeled, we obtain 

l- e - x Pk 
EN k = N- 



l-e- 



(43) 



The proof is completed by noting that EL is obtained from (4) 
by replacing Nk with EN k . 
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2.2 Profile likelihood based confidence intervals. The 

existence sand uniqueness of the profile-likelihood-based confi- 
dence intervals are proven as follows. 

Proof of Result 2. The proof consists of several parts. 

n 

Part A: Existence in the generic case. We first assume N k > N 

k=l 

and N k # N for at least one k and prove the CPs existence. 

The CPs bounds satisfy (12). The equations — — =0, yield 

Spk 

P = N k k— r, or 



einPk — 1 



Pk=-jlog[ 1- 



NkX 



(44) 



which implies that f}>N k A must hold for all k. Since, ^S~\pk = 1> 

k=l 

by summing up the above expression one arrives at 
1 = — -V^logO 1— ). Thus, for fixed X the Lagrange 



k=\ 



multiplier ji is a zero of the function 



^)=l + ^log(l-f). 



(45) 



Its derivative is given by 



(46) 



Hence, g is strictly monotonically increasing in /?, and 
consequently has at most one zero P(X). Note that 
lim g x (fj)=-co and lim gx(fi)=l. Hence, g^(j8) = 0 has 

/i^maxA^A /f->oo 

exactly one solution P(X). Furthermore, according to the implicit- 
function theorem, fi(X) is a continuously differentiable function of 
X. 

The likelihood function (4) can be rewritten as 



L(X,p) = - N £ p k log (e A - 1) + N £ ^ log - 1) 



(47) 



Note that 



(**-V*r_ lim eW ^-e-^W 

™ (ei_l)f/t ™ (1 -«-*)"* 

1 if N k = N, 
0 if N k <N. 



Since N k <N for at least one fc, it follows that 



lim L(X,p) = - co 



(48) 



for any arbitrary but fixed allele-frequency vector p. Moreover, the 
proof of Remark 1 reveals that 



lim L(X.,p)- 



(49) 



Now, for any X, let L(X,P(X)) : =L(X,p) with p given by (44) with 
P = P(X). 

Next, we show indirectly that lim L{X,fi(X)) = 

lim L(X,p(X))= - co. 

A-»oo 

First, assume lim L(X,B(X))^ —go. Hence, there exists a 

a->0 

sequence (/.„), with lim X n = 0 but lim L(X n ,P(X„)) # — oo. 

«-*oo «->oo 

Hence, 3 ^4 > — co such that for a subsequence A Ki , 
lim L(X„ k ,P(X„ k )) = A. Without loss of generality, 

lim L(X„,P(X„)) = A. Let be the corresponding sequence of 

allele-frequency vectors. Since the simplex is compact, there exists 
a convergent subsequence p nt —>P- Because L is continuous, it 
follows that lim L(X,p) = A> — go, contradicting (48). 

Analogously it is shown that lim L(X,P(X)) = — oo. 

A-*oo 

Since L(X,p) — I* > 0, as well as L are continuous, and 

lim L(X,/}(X))-l* = lim L(X,p(XJ)) — l* = —co, there exist 

Ai <A<^2, such that (Xi,p(Xi),p) is a solution of (12), where /> is 
given by (44). This proves the existence of the CPs bounds. 

Part B: Uniqueness in the generic case. Next, the uniqueness of the 
confidence intervals is proven. Assume two values X\ < X 2 with 
L(Xi,p(Xi))-l* =L(X 2 ,p(X 2 ))-r=0. Since L{X,P(X)) is continu- 
ously differentiable the mean value theorem implies that there 

exists X\ < X < X 2 with — (X,p(X)) = 0. Application of the chain 
aX 

rule yields § (W)) = | (W)) + 

By definition of /?(!), the relation ^ (X,P(X)) = P{X) holds. 

opt 



8X 



k=\ 

PI fj " 

= j J (xm)+m 7r) J2pM( x » 



= ^-(X,P(X)) + P(X)^l = Thus, 

0=^(XM))=^W))= 8 ^(IP), where /, is given by 

(44) with P = P(X). This implies that (X,p,P(A)) is a zero of (39), or, 
in other words, that X is a maximum likelihood estimate. Because 
of its uniqueness X = X, and X\<X<X 2 . Hence, X\<X 2 <X or 
A<Ai <X 2 is impossible, and the CI is therefore uniquely defined. 
Part C: Existence and uniqueness in the non-generic cases. In the case 

n 

J2N k = N the same proof holds with obvious modifications. As 

fc=l 

n 

(49) is violated and becomes lim L()^,p)= ^^N k \ogp k > - co. 

* k=i 
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It follows that at least one solution of (12) exist. The above proof of 
uniqueness, implies that this is the only solution. 

Similarly, for N/ t =N for all k, (48) is violated and becomes 
lim L(X,p) = 0, from which the existence of exactly one solution 

of (12) follows from the same proof as in the generic case. 

Part D: Derivation of the CIs in the generic case. Parts A and B reveal 
that the bounds of the CPs bounds are the two solutions ((X,p) and 

(I J)) of the equations L(X,P) - I* = 0 and g>{P) = 0, where gx(P) is 
given by (45), and L(X,p) = L(X,p) with p=p(P) given by (44). A 
littie algebraic manipulations yields that L(X,fi) is given by (16e). 

The solutions can be found by a Newton method. Straightfor- 
ward calculation gives 



dL-l* „ e l 
-N- 



dx 



e A -l k 



Clearly, since A>0, detM = 0 if and only if N- 



e A — 1 

According to the proof of Result 1 this condition is only fulfilled at 
the unique ML estimate. Hence, detJWVO in (k,fi) and (I,j8). 
Therefore, the Newton method converges quadratically for any 
initial value sufficiendy close to the respective solution. □ 
2.3 Asymptotic confidence intervals. 

Proof of Result 3. This proof is slightly more general than 
necessary as we will re-use part of it later. 

First, consider a matrix M = (my) with the following structure 



with 



M-- 



B' 



(50a) 



8gA 

dX ' 



dL-l* _ 1 



N k X 



= -A\-g>.(P))-^A, 



Sg>. = 1 
dp lp 



A, 



where A = A(X,P) and gxifi) are given by (16c) and (45) or (16d), 
respectively. Hence, the Newton method leads to the following 
iteration 



a 0 
0 0 



B- 



, and 



D = diag(d u ...,d n ). 

Let M = (mij)jj = M~ l . We aim to derive m\\. We do so by 
inverting M blockwise. Namely, 



M~ 



(A-BD- , B T r 1 BD- 1 



(A — BD~B)~ 

D-'B T (A-BD- , B T y l D- 1 +D- i B T (A-BD- i B T y l BD- i 



The formulae applies whenever, di # 0 and the 2x2 matrix 
A — BD 1 B T is invertible. Moreover, 



X l+ \ 



/8L-1* dL-l*\ 

dx dp ' 

dgA dgx 

V dx dp J 



L-l* 



(MM 



Due to its relatively simple form, the above matrix can be easily 
inverted and the iteration can be rewritten as (16a) and (16b). 

The Newton methods converges locally quadratically if the 
above matrix is nonsingular in the solutions. Part A of the proof 
reveals that these solutions satisfy g^(p)=l, yielding 

-^Y = —^{k — A). Hence, the matrix simplifies to 

OA X 



A — BD~ l B T = 



a n an 
a 2 \ a 2 2 



(A-BD- l B T y 



1 



(51) 



where au=a—S^^, 012 = 021 = -^^, and a 2 2 = 
11 1 

— — . Its inverse is given by 



a 2 2 —an 



a n a 22 -a\ 2 \ -an a n 



( N ' 



M-- 



' + Ia 



k 



e*-l ' X 2 
\{X-A) 



Therefore, 



detM = 



)}p 



N- 



Xe x 



xp A J 



1 X 



A A 



Hence, the desired quantity m\\ becomes 

( 

t 

r[<h 



m n =(M-\ 



1 



fl n -a\ 2 /a 22 



" h 2 

„ V; * 1 



\ 



EC 

k=\ ) 



We are now ready to derive the confidence interval given by 
(18). To derive (J^ x (»)) n we first note that (7), (40) and 
rearrangement of the parameters imply that the Fisher informa- 
tion matrix has the form (50), with 
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8 Z A 



-(*), b k - 



8 Z A 



- (3) and d k = 



8 2 A 



81 1 8 Pk 8X y "' "* dp 2 



(9) 



given by (40), and (J N 1 (9)) n corresponds to fh\\. Therefore, 

Xe Xp k \ 

-(l--sr-r) 



"vft? N k X 2 e x Pk\ e x Pk „ tetek ■ : 



-iyj < 

X 2 V e%-l 



e^Pi _ 1 v g^p* _ 1 

2 



(52) 



and consequently 



an : 



{e A -X? (e^k-l) 2 ^ X \ e x Pk-l 



e k ?k — 1 



(^-i) 2 i 2 ^ V 



Moreover, 



By inspecting (40), it becomes clear that all derivations remain 

_l_ e -4f>* 



unchanged with JVjfc replaced by E N k = N 
This gives 



\-e->- 



(cf. eq. 53). 



(e A -l) 2 (e A -l)A 2 



Ne x 
(e--l)x 2 ~ 



(2X + n-Y,e Apk )- 
k=\ 

(X+n- J2 eXpk ) 2 



\- 1 



/ Mr 

I 



+ 21 + 



k=l / 



which simplifies to 



e x Pk - 1 V e^fc - 1 



iy (e ^_ 1) f 1 _^_V-+-- 1 y^ 



and 



Al _ 1 A (e x Pk - l) 2 
t^Jk~J}^ N k e x Pk 



{ ( 



-Ne A 



e x -\ 

V v 



+ - 



n-J2e x Pk 



n 



(54) 



Substituting the above with $ = # into (18) - using the fact that 

In(&) = Jn($) - yields (20) after after a litde algebraic manipula- 
tion. 



The identities e Xpk = l + - 



(l-e x )EN k 



N-(l-e x )EN k 
Substituting this into (54) gives 



follow from (43). 



Hence, 



(V(3))n = 



-y 2 Y^N k e x Pk(\ 



e A Pk - 1 



(53.) 



( -Ne x 
ie x -lf ^ 

n 

(X + n-^e Xp kf 

J k=\ 

X 2 ^{e x Pk-\f 

V hi N k eXPk 



Deriving (I N (9)) n is easy. Namely, exacdy the same 
calculations hold with 



-S 2 A. 



a=-E^(S), b k =-Ep±-(!f) and d k = E"—^(9). 

8a 1 dp k dX 8p% 



.8 A 



(V(*))n = 



-Ne x 



( 



(^-l) 2 



J 



(55) 



Substitution of the above evaluated at & (using the fact that 
N k = EN k ) into (18) yields (21) after some rearrangement. 



□ 



Proof of Result 4. To simplify the notation, we first derive the 
formulas for the confidence interval of p n . By re-arranging the 
parameters as in the proof of Result 3, it is obvious that the matrix 
M given by (50) can be used instead of the Fisher information Jff 
(or/jv). Particularly, (M _1 ) B+2>JI+2 = (J J 7 1 ) B+1>J , +1 . 

We can apply a blockwise inversion formula to M similar as in 
the proof of Result 3. Namely, 
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M~ l =\ 



X d n 



(M\ —B\ d~ 1 Bf ) " 1 - Mf 'Si (d n - BfMf 1 Bi ) - 
- d~ 1 B\(M\ - B, rf- 1 Bf )- 1 (d n —Bf Mf 1 Bi)~ 1 



where 



h- 

4 = 



Ne" . -1 , ^ Pi 
e>-Ve 

Ne 



V-l f-;e A «-l 



e ^ _ 1 g*p k - 1 

N)}e x 1_ 

e 1 - 1 e Xp k - 1 ' 



A B 
B T D 



with 



A = 



and Bf = (b„, 1,0, ...,0), 



:)• H h : 

Z) = diag(fi?i, . . . , d„-i). 



b„-i 
1 



and 



Therefore, 



an = 



Ne* f -1 ^ /? 2 



; ; - 1 I <? A - 1 + ^e^* - 1 



-H - 1 



Ne>- 



1 J -p B , n - 1 1 



Clearly, 

(^" 1 )„+2,„ + 2=(^-<^r 1 J Bi) _1 
i 

d n -b 2 n fhn -b n m\2-b n m 2 \ -m 2 2 ' 

where are the elements of Afj - 1 . The inverse of M\ is 
calculated exactly as the inverse of M in the proof of Result 3. 
Namely, we arrive at 



012 



Hence, 



mn W12 

W21 W22 



1 



«22 — «12 



ana 22 -af 2 V -«12 a\\ 



-b n a 2 2 + 2b n an-an = 



n-l . 



with ci\\=a— > -p , a\2 = — > — , and a 2 2 = — > — . 
Hence, the desired quantity wn becomes 



1 )„+2,„ + 2= .2 — 

a n a 22 -af 2 



. 2(1 _ M 4 V_lz^_"^l + ly e ^ 
e*"-lyl A ;. 2 x 2 f^ 



To derive the desired quantity ft i / 1 1— />„ «— 1 1 y4 ^ 

( = (VWWi,„ + i) we need to set B k = V X *=» / 

CM, 



5 2 A , a 2 A 

and a/t = — r— t" ■ By using (40) and (43) we obtain 



e A - 1 I (e^« - l) 2 £ri e A P"-l e 1 - 
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Moreover, 

2 

a\\a 2 2 — a n - 



□ 

l-p„ n-l 1 ^ ■ n-l 1 ^ x \ 2.4 Testing the Parameters.. Proof of Result 5. The 



result is proven by showing that the iteration (29) leads to the 
profile-likelihood with X = Xo- The proof of Remark 2 reveals that 
the desired values for fl is the unique zero of gA 0 (fi) given by (45). 
The zero can be found using a Newton method. Combining (45) 
-Pn n ~ 1 1 Xp k \ an <3 (46) yields (29) after a little rearrangement. 

* v *h e ) 

□ 



1 , , l~Pn \ / n-l 1 



+ 2- 



' -V + 2 (l^(_^ .y> 

X A fc=l 



X 



X 



Combining the above yields, 
b\an + 2b n a\i-a\\ _ 



d„ + 



JVlV 



a u a 22 -at 



2p„{\-p„) 1 \ 



(l -n+V: e»*)- APn l , + ^r-r 

1 (e^n-l) 2 ^ e 1 -! 



e 4fti — 1 



V 



and finally 



Cat ')»+i,n+i - 



g (1 _, + g^)- ggg-gJ + ' \ 



gipn — 1 



V 



e A -l 



Remark 6. if/io and p 0 (and ft 0 ) are the true (unknown) parameters, 
the asymptotic j^\x 0 ,m 0 ~N(0Jx a i ts - I^J^I^J holds. 

We aim to test only for Xo, so any choice can be made for the true parameter. 
However, the parameters p 0 occur in the asymptotic variance 
IloXo ~ h-o/i^I/t^I/igAo- Hence, we need a plug-in estimate for the asymptotic 
variance. There are two possibilities. First, the true parameter p 0 is replaced by 
the profile-likelihood estimates p 0 based on Xo and the asymptotic variance by 



Here, either the expected or the observed Fisher 



information can be used. 

Second, both Xo and p 0 can be replaced by the ML estimate (X,p). In this 
case the expected and observed Fisher information coincide. 

Proof of Result 6. The remark is proven by explicitly deriving 
the test statistic. To simplify the notation we write X and p for Xo 
and p 0 , respectively. To derive J ii — J ^J^p J n>. (or 
hi.— hftlf^ Iftk) we can follow the proof of Result 3. 

From the blockwise inversion formula (51) the relation 



1 



(VW)n 



(56) 



follows immediately, where the denominator on the left-hand side 
is given by the reciprocal of (53). 

8L dA 

Noting that — = — given by (39a) one obtains 



dx dx 



-N- 



Pte Xp 



Substituting this and (56) in 



e A — i f— ' "e^t — 1' 

k— 1 

the test statistic (31), and writing Xo and p 0 for X and p gives (32). 
Of course, (56) also holds if Jn is replaced by In, where 
I is given by (54). Thus, the same reasoning as above 



yields (33). 



□ 



Hence, the bounds of the confidence intervals are given by 



1 e x -\ 



2 X V Ne x 



(1 -n + £ e x Pk)- 2 Pf-Pn) + _L 



e ifn _ 1 



k = l 

L-(l-„+ £ e %)-(l-pJ 

k 



e X Pn-\ e X -\ 



By replacing n by j, one obtains the confidence interval of pj 
given by (22). 



3 The case 1 = 0 

3.1 Log-likelihood. In the limiting case that the true 
parameter is X = 0 the conditional poison distribution becomes 



1 for m=l, 
0 for m > 1 . 

Following the derivations in subsection 4. 1 , Q\ becomes 

_(Pk for i = e k , 
Qi ~\ 0 else, 



(57) 
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where e k denotes the kxh base vector. Hence, the likelihood 3.2 Derivatives of the likelihood function. 

function (3) becomes 



L = L{Q,p)=Y j N k \ogp k . 



(58) 



fc=i 



dX a^o X 



" 1 e >n - 1 

= > hm-log— 1 — - 

^ i^X b p k {e x -\) 



This is the limiting case of (3) for A— >0. Furthermore, we can 
conclude the following. 

Remark 7. If the true parameter is Xq = 0, according to (57), an 



Applying de l'Hospitals rule gives limlog — — -. — — =0. 

A->o p k \e A) 

Hence, successive application of this rule to the above yields 



observation X = (Ni , . . . ,N„) with N k >N is impossible in a sample SA(0,p,P) _ . . p k e 

t^i ex f-i ™ 



%( e A _ i) _ ( e % _ l)e*p k (e x - 1) 



of size N. Hence, N k = N with probability one. 



k=\ 



Assume Xq = 0 is the true parameter. Then, we can assume 



e^k - 1 



= YN k )im Pk , - — 

^ K A^ol- e -%- 1- 



L(X,p) = . 



Y,N k log 



^p* - 1 



for X>0 



e x -\ 

Y N k\ogp k for A = 0, 



k =i 



As mentioned above, the case X = 0 is just the continuation of 
the likelihood function. 

n 

Hence, we can define A(X,p$) : = L{X,p) + P{\ - Y^Pk)- 

fc= l 

Moreover, the (one-sided) derivatives of the likelihood function 
A exist in X = 0. We have, 



= ViV lim Pk(i-e- A )-(l-e- A "k) 



ViVfclim 



p k (e- 1 -e- 1 Pk) 



\ ^0p k e- ? Pk +e -A_(i +p k ) e - x ( 1 +Pk> 



V N k lim 



-p k (e- x -p k e- x Pk) 



o -p\ e - lf k - e ~ x + (l+p k ) 2 e-^ 1+p k) 



8A(0,p,p) _ lim 8A(X,p,p) = J-^Pk-l 



A^O 5/1 



(59a) 



-p^-p^ 



dA(0,p,P) dA(X,p,P) N k 

= hm — = p, 

dp k a^o dp k p k 



(59b) 



, , , , dA(0,p,p) ,. dA{X,p,p) 
JNote, that the last steps also proves t-, = lim — . 

dX a^o dX 

Similarly, from (39d) 



,. 8A(X,p,p) ,. Xe lp k 
lim — = lim N k - 



A^o dp k A^O e Xp k — 1 A^O 



lim N k 



e'-Pk+Xp^Pk N k 



Pk 



e x Pk 



Pk 



8A(0,p,p) dA(X,p,P) ^ 

- hm — = 1 — y p k . 

fc=i 



dp 



a^o dp 



(59c) dA(X,p,P) N k 

Since from (58) r = — , one obtains (59b). 

dp k p k 



The proof is found in the next subsection 6.2. 

From (59a) we immediately see that — ^ /f — < 0. Hence, the 

OX 

ML estimate X = Q (cf. Remark 2) is a boundary maximum. 
However, it is necessary for the asymptotic distributions (1 1), (17), 
(30), and (38) that all derivatives of the likelihood function vanish. 
As this is not the case, we can neither derive confidence intervals, 
nor test the parameters in the case X = 0. 
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